


Creators/Authors contains: "Zhang, Amy"

  1. When groups of people are tasked with making a judgment, the issue of uncertainty often arises. Existing methods to reduce uncertainty typically focus on iteratively improving specificity in the overall task instruction. However, uncertainty can arise from multiple sources, such as ambiguity of the item being judged due to limited context, or disagreements among the participants due to different perspectives and an under-specified task. A one-size-fits-all intervention may be ineffective if it is not targeted to the right source of uncertainty. In this paper, we introduce a new workflow, Judgment Sieve, to reduce uncertainty in group judgment tasks in a targeted manner. By using measurements that separate different sources of uncertainty during an initial round of judgment elicitation, we can then select a targeted intervention, adding either context or deliberation, that most effectively reduces uncertainty on each item being judged. We test our approach on two tasks, rating word pair similarity and rating the toxicity of online comments, and show that targeted interventions reduced uncertainty for the most uncertain cases. In the top 10% of cases, we saw ambiguity reductions of 21.4% and 25.7% and disagreement reductions of 22.2% and 11.2% for the two tasks, respectively. We also found through a simulation that our targeted approach reduced the average uncertainty scores for both sources of uncertainty, whereas with uniform approaches, reductions in average uncertainty from one source came with an increase in the other.
    Free, publicly-accessible full text available September 28, 2024
  2. To investigate the well-observed racial disparities in computer vision systems that analyze images of humans, researchers have turned to skin tone as a more objective annotation than race metadata for fairness performance evaluations. However, the current state of skin tone annotation procedures is highly varied. For instance, researchers use a range of untested scales and skin tone categories, have unclear annotation procedures, and provide inadequate analyses of uncertainty. In addition, little attention is paid to the positionality of the humans involved in the annotation process—both designers and annotators alike—and the historical and sociological context of skin tone in the United States. Our work is the first to investigate the skin tone annotation process as a sociotechnical project. We surveyed recent skin tone annotation procedures and conducted annotation experiments to examine how subjective understandings of skin tone are embedded in skin tone annotation procedures. Our systematic literature review revealed the uninterrogated association between skin tone and race and the limited effort to analyze annotator uncertainty in current procedures for skin tone annotation in computer vision evaluation. Our experiments demonstrated that design decisions in the annotation procedure such as the order in which the skin tone scale is presented or additional context in the image (i.e., presence of a face) significantly affected the resulting inter-annotator agreement and individual uncertainty of skin tone annotations. We call for greater reflexivity in the design, analysis, and documentation of procedures for evaluation using skin tone. 
    Free, publicly-accessible full text available June 1, 2024
  3. Free, publicly-accessible full text available May 1, 2024
  4. This dataset contains 710 GitHub-hosted OSS projects, each with a governance file in the root directory of the project, along with the commits, issues, and comments for each project.
  5. Open-source software (OSS) has become a valuable resource in both industry and academia over the last few decades. Despite the innovative structures they develop to support their projects, OSS projects and their communities have complex needs and face risks such as abandonment. To manage their internal social dynamics and community evolution, OSS developer communities have started relying on written governance documents that assign roles and responsibilities to different community actors. To facilitate the study of the impact and effectiveness of formal governance documents on OSS projects and communities, we present a longitudinal dataset of 710 GitHub-hosted OSS projects with GOVERNANCE.MD governance files. This dataset includes all commits made to the repository, all issues and comments created on GitHub, and all revisions made to the governance file. We hope its availability will foster more research on how OSS communities govern their projects and on the impact of governance files on communities.
  6. Making online social communities ‘better’ is a challenging undertaking, as online communities are extraordinarily varied in their size, topical focus, and governance. As such, what is valued by one community may not be valued by another. However, community values are challenging to measure, as they are rarely explicitly stated. In this work, we measure community values through the first large-scale survey of community values, including 2,769 reddit users in 2,151 unique subreddits. Through a combination of survey responses and a quantitative analysis of publicly available reddit data, we characterize how these values vary within and across communities. Among other findings, we show that community members disagree about how safe their communities are, that longstanding communities place 30.1% more importance on trustworthiness than newer communities, and that community moderators want their communities to be 56.7% less democratic than non-moderator community members. These findings have important implications, including suggesting that care must be taken to protect vulnerable community members and that participatory governance strategies may be difficult to implement. Accurate and scalable modeling of community values enables research and governance tuned to each community's different values. To this end, we demonstrate that a small number of automatically quantifiable features capture a significant yet limited amount of the variation in values between communities, achieving a ROC AUC of 0.667 on a binary classification task. However, substantial variation remains, and modeling community values remains an important topic for future work. We make our models and data public to inform community design and governance.
  7. Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we describe why appropriate uncertainty handling can actually be essential in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces a kind of implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite. 
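The Judgment Sieve abstract (item 1) describes measuring two separate sources of uncertainty, ambiguity and disagreement, and routing each item to a matching intervention. The Python sketch below illustrates that idea under stated assumptions; it is not the paper's measurement or code. It assumes each rater reports a best-guess rating plus a personal standard deviation, and it uses the law of total variance to split the group's spread into a within-rater part (standing in for ambiguity) and a between-rater part (standing in for disagreement). The function names, the rating format, the example data, and the routing rule are all hypothetical.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(ratings):
    """Split a group's uncertainty about one item into two components.

    `ratings` is a list of (mean, std) pairs, one per rater: the rater's best
    guess and how unsure they are (an assumed elicitation format). Treating the
    group judgment as an equal-weight mixture of the raters' distributions, the
    law of total variance gives
        total = mean within-rater variance + variance of rater means.
    """
    ambiguity = fmean(std ** 2 for _, std in ratings)   # within-rater part
    disagreement = pvariance([m for m, _ in ratings])   # between-rater part
    return ambiguity, disagreement

def choose_intervention(ratings):
    """Hypothetical routing rule: target whichever component dominates."""
    ambiguity, disagreement = decompose_uncertainty(ratings)
    return "add_context" if ambiguity >= disagreement else "deliberate"

def plan_interventions(items, top_fraction=0.1):
    """Assign interventions only to the most uncertain items.

    `items` maps an item id to its list of (mean, std) ratings.
    """
    totals = {k: sum(decompose_uncertainty(r)) for k, r in items.items()}
    n = max(1, round(len(items) * top_fraction))  # always treat at least one item
    worst = sorted(totals, key=totals.get, reverse=True)[:n]
    return {k: choose_intervention(items[k]) for k in worst}

# Hypothetical example data (not from the paper).
items = {
    "pair_a": [(3.0, 1.5), (3.2, 1.4), (2.9, 1.6)],  # agree, but each rater unsure
    "pair_b": [(1.0, 0.2), (4.8, 0.3), (1.2, 0.2)],  # confident, but disagree
    "pair_c": [(2.0, 0.1), (2.1, 0.1), (2.0, 0.2)],  # little uncertainty
}
```

With this data, `pair_a` is routed to added context because its raters agree but are individually unsure, while `pair_b` goes to deliberation because confident raters disagree. Routing on the larger component is only one plausible rule; a real workflow might instead compare each component against a threshold.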
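Item 2 reports that annotation design choices changed inter-annotator agreement on skin tone labels. One common statistic for agreement among several annotators choosing from a fixed scale is Fleiss' kappa; the paper may use a different measure, so treat this Python sketch as a generic illustration only. The function name and the input layout (one list of labels per image, with the same number of annotators per image) are assumptions.

```python
from collections import Counter

def fleiss_kappa(annotations, categories):
    """Fleiss' kappa for items that each received the same number of labels.

    `annotations` is a list of per-item label lists, e.g. the scale points
    chosen by each annotator for one image; `categories` lists every scale point.
    """
    n = len(annotations[0])  # annotators per item
    if n < 2 or any(len(a) != n for a in annotations):
        raise ValueError("each item needs the same number (>= 2) of labels")
    num_items = len(annotations)
    counts = [Counter(a) for a in annotations]
    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n) / (n * (n - 1)) for c in counts
    ) / num_items
    # Expected chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[k] for c in counts) / (num_items * n)) ** 2 for k in categories
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is the same category")
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement; for example, two images each labeled identically by two annotators, but with different labels across images, give a kappa of 1.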
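Item 6 reports a ROC AUC of 0.667 on a binary classification task. ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting as half), so 0.5 is chance-level ranking and 1.0 is a perfect ranking. The minimal Python sketch below computes it directly from that pairwise definition; the function name is hypothetical, and libraries such as scikit-learn provide an optimized `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    `labels` holds 0/1 class labels and `scores` the classifier's scores.
    Ties count as half a correct ranking. Runs in O(P * N) time, which is
    fine for illustration but slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. On this reading, an AUC of 0.667 means a positive example outranks a negative one about two-thirds of the time.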